{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Temporal-aware recommendation.ipynb",
      "version": "0.3.2",
      "provenance": [],
      "private_outputs": true,
      "collapsed_sections": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "[View in Colaboratory](https://colab.research.google.com/github/ylongqi/openrec/blob/master/tutorials/Temporal_aware_recommendation.ipynb)"
      ]
    },
    {
      "metadata": {
        "id": "ny0FSG4batHE",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "<p align=\"center\">\n",
        "  <img src =\"https://recsys.acm.org/wp-content/uploads/2017/07/recsys-18-small.png\" height=\"40\" /> <font size=\"4\">RecSys 2018 Tutorial</font>\n",
        "</p>\n",
        "<p align=\"center\">\n",
        "  <font size=\"4\"><b>Modularizing Deep Neural Network-Inspired Recommendation Algorithms</b></font>\n",
        "</p>\n",
        "<p align=\"center\">\n",
        "  <font size=\"4\">Hands on: Temporal-aware recommendations</font>\n",
        "</p>"
      ]
    },
    {
      "metadata": {
        "id": "EpOmkPCImixl",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Install the OpenRec library\n",
        "---\n",
        "\n"
      ]
    },
    {
      "metadata": {
        "id": "ph2bOzItmbcf",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "!pip install openrec"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "GWvu2bKr8UdQ",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "\n",
        "RNN-based temporal recommendation model (RNN-Rec)\n",
        "---\n",
        "The RNN-Rec model takes as input the list of items a user has consumed and predicts the item that the user is likely to consume next.\n",
        "\n",
        "<p align=\"center\">\n",
        "  <img src =\"https://s3.amazonaws.com/cornell-tech-sdl-openrec/tutorials/rnn_rec.png\" height=\"200\" />\n",
        "</p>\n",
        "\n",
        "To implement such a model using openrec.tf1, one first needs to decide how the recommender should be decomposed into subgraphs, i.e., **inputgraph**, **usergraph**, **itemgraph**, **interactiongraph** and **optimizergraph**. For example, the training graph of RNN-Rec can be decomposed as follows.\n",
        "\n",
        "\n",
        "<p align=\"center\">\n",
        "  <img src =\"https://s3.amazonaws.com/cornell-tech-sdl-openrec/tutorials/rnn_rec_module.png\" height=\"200\" />\n",
        "</p>\n",
        "\n",
        "* **inputgraph**: takes the item consumption history and the ground-truth label as inputs.\n",
        "* **usergraph**: left empty, as no user-specific latent factor is needed.\n",
        "* **itemgraph**: extracts latent factors for items.\n",
        "* **interactiongraph**: uses an RNN and softmax to model user-item interactions.\n",
        "\n",
        "After defining subgraphs, their interfaces and connections need to be specified. A sample specification of RNN-Rec can be as follows.\n",
        "\n",
        "<p align=\"center\">\n",
        "  <img src =\"https://s3.amazonaws.com/cornell-tech-sdl-openrec/tutorials/interface_connections.png\" height=\"200\" />\n",
        "</p>\n",
        "\n",
        "The serving graph of RNN-Rec needs to be defined similarly."
      ]
    },
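    {
      "metadata": {
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "The interaction step described above (an RNN over the consumed items, followed by a softmax over all items) can be sketched in plain NumPy. This is only an illustration under stated assumptions: it uses a vanilla tanh RNN cell, random untrained weights, toy sizes, and right-padded sequences, none of which are confirmed to match OpenRec's `RNNSoftmax` module. The names `W_x`, `W_h`, `W_out` and `next_item_probs` are hypothetical.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "rng = np.random.default_rng(0)\n",
        "\n",
        "# Toy sizes (hypothetical; the tutorial itself uses much larger values).\n",
        "total_items, dim_item_embed, num_units = 20, 8, 4\n",
        "\n",
        "# Parameters that would normally be learned during training.\n",
        "item_embed = rng.normal(scale=0.1, size=(total_items, dim_item_embed))\n",
        "W_x = rng.normal(scale=0.1, size=(dim_item_embed, num_units))\n",
        "W_h = rng.normal(scale=0.1, size=(num_units, num_units))\n",
        "W_out = rng.normal(scale=0.1, size=(num_units, total_items))\n",
        "\n",
        "def next_item_probs(seq_item_id, seq_len):\n",
        "    '''Return a probability distribution over the next item.'''\n",
        "    h = np.zeros(num_units)\n",
        "    # Only the first seq_len positions hold real items; the rest is padding.\n",
        "    for item in seq_item_id[:seq_len]:\n",
        "        h = np.tanh(item_embed[item] @ W_x + h @ W_h)\n",
        "    logits = h @ W_out\n",
        "    exp = np.exp(logits - logits.max())  # numerically stable softmax\n",
        "    return exp / exp.sum()\n",
        "\n",
        "history = np.array([3, 7, 1, 0, 0])  # three consumed items, then padding\n",
        "probs = next_item_probs(history, seq_len=3)\n",
        "label = 12                            # hypothetical ground-truth next item\n",
        "loss = -np.log(probs[label])          # cross-entropy for this one example\n",
        "```\n",
        "\n",
        "In this sketch, training would minimize `loss` over many (history, label) pairs, while serving would rank items by `probs`."
      ]
    },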
    {
      "metadata": {
        "id": "1r5E55hVGCcH",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Define an RNN recommender in OpenRec\n",
        "---\n",
        "\n"
      ]
    },
    {
      "metadata": {
        "id": "GBwFOTu8GBqW",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "from openrec.tf1.recommenders import VanillaYouTubeRec\n",
        "from openrec.tf1.modules.interactions import RNNSoftmax\n",
        "import tensorflow as tf\n",
        "\n",
        "def RNNRec(batch_size, dim_item_embed, max_seq_len, total_items, num_units, l2_reg=None,\n",
        "    init_model_dir=None, save_model_dir='Recommender/', train=True, serve=False):\n",
        "    \n",
        "    ## By default, reuse everything from VanillaYouTubeRec\n",
        "    rec = VanillaYouTubeRec(batch_size=batch_size,\n",
        "                            dim_item_embed=dim_item_embed,\n",
        "                            max_seq_len=max_seq_len,\n",
        "                            total_items=total_items,\n",
        "                            l2_reg_embed=l2_reg,\n",
        "                            init_model_dir=init_model_dir, \n",
        "                            save_model_dir=save_model_dir, \n",
        "                            train=train, serve=serve)\n",
        "    \n",
        "    ## [TODO] Please define input ports (Hint: using \"ins\" parameter)\n",
        "    ## @rec.traingraph.interactiongraph(ins=[?FILL IN?])\n",
        "    ## Answer:\n",
        "    @rec.traingraph.interactiongraph(ins=['seq_item_vec', 'seq_len', 'label'])\n",
        "    def f(subgraph):\n",
        "        RNNSoftmax(seq_item_vec=subgraph['seq_item_vec'], \n",
        "                   seq_len=subgraph['seq_len'], \n",
        "                   num_units=num_units, \n",
        "                   total_items=total_items, \n",
        "                   label=subgraph['label'], \n",
        "                   train=True, \n",
        "                   subgraph=subgraph, \n",
        "                   scope='RNNSoftmax')\n",
        "    \n",
        "    ## [TODO] Please define input ports (Hint: using \"ins\" parameter)\n",
        "    ## @rec.servegraph.interactiongraph(ins=[?FILL IN?])\n",
        "    ## Answer:\n",
        "    @rec.servegraph.interactiongraph(ins=['seq_item_vec', 'seq_len'])\n",
        "    def f(subgraph):\n",
        "        RNNSoftmax(seq_item_vec=subgraph['seq_item_vec'], \n",
        "                   seq_len=subgraph['seq_len'],\n",
        "                   num_units=num_units, \n",
        "                   total_items=total_items, \n",
        "                   train=False, \n",
        "                   subgraph=subgraph, \n",
        "                   scope='RNNSoftmax')\n",
        "    \n",
        "    return rec"
      ],
      "execution_count": 0,
      "outputs": []
    },
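    {
      "metadata": {
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "The decorator calls above, with their `ins` argument, follow a registration pattern: a build function is attached to a subgraph slot together with the names of its input ports. The toy class below sketches that pattern in plain Python. It is not OpenRec's implementation; `SubgraphSlot`, `build` and `build_interaction` are hypothetical names introduced only for illustration.\n",
        "\n",
        "```python\n",
        "class SubgraphSlot:\n",
        "    '''Hypothetical stand-in for one subgraph slot such as interactiongraph.'''\n",
        "\n",
        "    def __init__(self, name):\n",
        "        self.name = name\n",
        "        self.ins = []\n",
        "        self.build_fn = None\n",
        "\n",
        "    def __call__(self, ins=None):\n",
        "        # Calling the slot returns a decorator that records the input ports\n",
        "        # and replaces whatever build function was registered before.\n",
        "        def decorator(fn):\n",
        "            self.ins = list(ins or [])\n",
        "            self.build_fn = fn\n",
        "            return fn\n",
        "        return decorator\n",
        "\n",
        "    def build(self, available):\n",
        "        # Pass only the declared input ports to the build function.\n",
        "        subgraph = {port: available[port] for port in self.ins}\n",
        "        return self.build_fn(subgraph)\n",
        "\n",
        "interactiongraph = SubgraphSlot('interactiongraph')\n",
        "\n",
        "@interactiongraph(ins=['seq_item_vec', 'seq_len'])\n",
        "def build_interaction(subgraph):\n",
        "    return sorted(subgraph)  # names of the ports this function can see\n",
        "\n",
        "ports = {'seq_item_vec': 'tensor A', 'seq_len': 'tensor B', 'label': 'tensor C'}\n",
        "visible = interactiongraph.build(ports)\n",
        "```\n",
        "\n",
        "Here `visible` contains only the two declared ports, which mirrors why the serving graph above omits `label` from `ins`."
      ]
    },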
    {
      "metadata": {
        "id": "ulH89eiY1b_i",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "Training and testing the RNN-Rec model\n",
        "---\n",
        "* Download LastFM dataset\n",
        "\n",
        "\n"
      ]
    },
    {
      "metadata": {
        "id": "X2xOW5ki9M8v",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "import requests\n",
        "\n",
        "dataset_prefix = 'http://s3.amazonaws.com/cornell-tech-sdl-openrec'\n",
        "\n",
        "# Download the test and training splits and save them to the working directory\n",
        "for filename in ['lastfm_test.npy', 'lastfm_train.npy']:\n",
        "    r = requests.get('%s/lastfm/%s' % (dataset_prefix, filename))\n",
        "    r.raise_for_status()  # Stop early if the download failed\n",
        "    with open(filename, 'wb') as f:\n",
        "        f.write(r.content)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "Gg3_KKmf1tHw",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "\n",
        "* Load LastFM dataset\n",
        "\n"
      ]
    },
    {
      "metadata": {
        "id": "sNpoN_9N2GZj",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "import numpy as np\n",
        "\n",
        "lastfm_train = np.load('lastfm_train.npy')  # The structured Numpy array for training\n",
        "lastfm_test = np.load('lastfm_test.npy')    # The structured Numpy array for testing\n",
        "\n",
        "total_users = 992     # Total number of users in the dataset\n",
        "total_items = 14598   # Total number of items in the dataset"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "5Qd-HeJQ3i53",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "* Inspect LastFM dataset"
      ]
    },
    {
      "metadata": {
        "id": "biWo1pXb3Xjn",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "print('Keys in LastFM training data:', lastfm_train.dtype.names)\n",
        "print('Number of training records:', len(lastfm_train))\n",
        "print('Keys in LastFM testing data:', lastfm_test.dtype.names)\n",
        "print('Number of testing records:', len(lastfm_test))"
      ],
      "execution_count": 0,
      "outputs": []
    },
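    {
      "metadata": {
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "For readers unfamiliar with structured NumPy arrays, the small example below shows how named fields and time-based sorting work. The records are made up, and only the `ts` field name is confirmed by this tutorial (it is later passed to `sortby`); `user_id` and `item_id` are assumed field names.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "# Hypothetical records: (user_id, item_id, ts).\n",
        "records = np.array(\n",
        "    [(0, 5, 30), (1, 2, 10), (0, 7, 20)],\n",
        "    dtype=[('user_id', np.int32), ('item_id', np.int32), ('ts', np.int32)],\n",
        ")\n",
        "\n",
        "print(records.dtype.names)               # field names of the structured array\n",
        "by_time = np.sort(records, order='ts')   # chronological order\n",
        "print(by_time['item_id'])                # items in the order they were consumed\n",
        "```\n",
        "\n",
        "Sorting by timestamp matters here because a temporal model must only see items consumed before the one it is asked to predict."
      ]
    },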
    {
      "metadata": {
        "id": "37QhXo704uk7",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "* Set values for hyperparameters of the RNN model\n",
        "\n"
      ]
    },
    {
      "metadata": {
        "id": "HQmojKQXZ0Rw",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "dim_item_embed = 50       # Dimensionality of the item embedding\n",
        "max_seq_len = 100         # Maximum sequence length used for prediction\n",
        "num_units = 32            # Number of units in the RNN model\n",
        "batch_size = 256          # Training batch size\n",
        "total_iter = 1000         # Total number of training iterations\n",
        "eval_iter = 100           # Evaluate the model every eval_iter iterations\n",
        "save_iter = total_iter    # Save the model every save_iter iterations (here, once at the end of training)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "REjN2-J25ggB",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "* Instantiate datasets and data samplers for training and testing\n",
        "\n"
      ]
    },
    {
      "metadata": {
        "id": "XBxezXZ96LrH",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "from openrec.tf1.utils import Dataset\n",
        "from openrec.tf1.utils.samplers import TemporalSampler\n",
        "from openrec.tf1.utils.samplers import TemporalEvaluationSampler\n",
        "\n",
        "train_dataset = Dataset(raw_data=lastfm_train, \n",
        "                        total_users=total_users,\n",
        "                        total_items=total_items, \n",
        "                        sortby='ts', name='Train')\n",
        "# \"sortby\" keyword is used to sort records based on timestamp\n",
        "\n",
        "test_dataset = Dataset(raw_data=lastfm_test, \n",
        "                       total_users=total_users,\n",
        "                       total_items=total_items, \n",
        "                       sortby='ts', name='Test')\n",
        "# \"sortby\" keyword is used to sort records based on timestamp\n",
        "\n",
        "train_sampler = TemporalSampler(batch_size=batch_size, \n",
        "                                max_seq_len=max_seq_len, \n",
        "                                dataset=train_dataset, \n",
        "                                num_process=1)\n",
        "test_sampler = TemporalEvaluationSampler(dataset=test_dataset, \n",
        "                                         max_seq_len=max_seq_len)"
      ],
      "execution_count": 0,
      "outputs": []
    },
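    {
      "metadata": {
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "To make the role of `max_seq_len` concrete, the sketch below turns one user's chronological item list into a fixed-length input sequence, its true length, and a label. It is an assumption about what a temporal sampler might produce, not the actual `TemporalSampler` code; the function name `make_temporal_example`, the zero padding value, and the right-side padding are all hypothetical choices.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "def make_temporal_example(item_history, max_seq_len):\n",
        "    '''Split a chronological item list into (input sequence, length, label).'''\n",
        "    label = item_history[-1]                      # the item to predict\n",
        "    context = item_history[:-1][-max_seq_len:]    # most recent items before it\n",
        "    seq_len = len(context)\n",
        "    seq_item_id = np.zeros(max_seq_len, dtype=np.int32)\n",
        "    seq_item_id[:seq_len] = context               # zero-pad on the right\n",
        "    return seq_item_id, seq_len, label\n",
        "\n",
        "seq, length, label = make_temporal_example([4, 9, 2, 7], max_seq_len=5)\n",
        "```\n",
        "\n",
        "Histories longer than `max_seq_len` are truncated to their most recent items, so the input shape stays fixed regardless of how much a user has consumed."
      ]
    },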
    {
      "metadata": {
        "id": "ml2REAQ0Ejro",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "* Instantiate a recommender and a corresponding model trainer\n",
        "\n"
      ]
    },
    {
      "metadata": {
        "id": "Ym9aLum4aBU5",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "from openrec.tf1 import ModelTrainer\n",
        "\n",
        "rnn_model = RNNRec(batch_size=batch_size, \n",
        "                   dim_item_embed=dim_item_embed, \n",
        "                   max_seq_len=max_seq_len, \n",
        "                   total_items=train_dataset.total_items(), \n",
        "                   num_units=num_units, \n",
        "                   save_model_dir='rnn_recommender/', \n",
        "                   train=True, serve=True)\n",
        "\n",
        "model_trainer = ModelTrainer(model=rnn_model)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "XaYGQoLkExxi",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "* Define evaluators to be used for testing\n",
        "\n"
      ]
    },
    {
      "metadata": {
        "id": "z7i7AHEhEa4K",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "from openrec.tf1.utils.evaluators import AUC, Recall\n",
        "\n",
        "auc_evaluator = AUC()\n",
        "recall_evaluator = Recall(recall_at=[100, 500])"
      ],
      "execution_count": 0,
      "outputs": []
    },
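    {
      "metadata": {
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "As a rough guide to what these two metrics measure, the sketch below computes Recall@K and AUC for a single test case with one ground-truth item. It reflects the common textbook definitions and is an assumption about, not a copy of, OpenRec's evaluators, which may treat ties, multiple positives, or sampled negatives differently. The scores are made up.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "def recall_at_k(scores, true_item, k):\n",
        "    '''1.0 if the true item is among the k highest-scored items, else 0.0.'''\n",
        "    top_k = np.argsort(-scores)[:k]\n",
        "    return float(true_item in top_k)\n",
        "\n",
        "def auc(scores, true_item):\n",
        "    '''Fraction of the other items scored strictly below the true item.'''\n",
        "    others = np.delete(scores, true_item)\n",
        "    return float(np.mean(others < scores[true_item]))\n",
        "\n",
        "scores = np.array([0.1, 0.9, 0.3, 0.7, 0.2])  # hypothetical model scores\n",
        "true_item = 3\n",
        "\n",
        "print(recall_at_k(scores, true_item, k=1))\n",
        "print(recall_at_k(scores, true_item, k=2))\n",
        "print(auc(scores, true_item))\n",
        "```\n",
        "\n",
        "Averaging these per-case values over all test cases gives dataset-level metrics; with `recall_at=[100, 500]`, recall is reported at two cutoffs."
      ]
    },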
    {
      "metadata": {
        "id": "PFYWm7HzE6dd",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "* Start training\n",
        "\n"
      ]
    },
    {
      "metadata": {
        "id": "R9pj-el-acFO",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "model_trainer.train(total_iter=total_iter, \n",
        "                    eval_iter=eval_iter, \n",
        "                    save_iter=save_iter, \n",
        "                    train_sampler=train_sampler, \n",
        "                    eval_samplers=[test_sampler], \n",
        "                    evaluators=[auc_evaluator, recall_evaluator])"
      ],
      "execution_count": 0,
      "outputs": []
    }
  ]
}